A Corpus-based Chinese Syllable-to-Character System

Authors

  • Chien-Pang Wang
  • Tyne Liang
Abstract

One popular family of Chinese input systems is based on Chinese phonetic symbols. Designing such a syllable-to-character (STC) input system involves two major issues: fault tolerance handling and homonym resolution. In this paper, the fault tolerance mechanism is built on a user-defined confusing set, and a modified bucket indexing scheme is incorporated to satisfy the real-time requirement. Homonym resolution is handled by binding force and heuristic selection rules. Both system performance and tolerance ability are evaluated on a real corpus in terms of search speed and character conversion accuracy. Experimental results show that the proposed scheme achieves 93.54% accuracy for zero-error syllable inputs and 80.13% for zero-tone syllable inputs. Furthermore, the proposed system is shown to remain robust and tolerant at high input error rates.
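
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a confusing set can be modelled as a map from each syllable to the syllables it is commonly mistyped as, and it uses a single made-up score per word as a stand-in for binding force. The names `CONFUSING_SET`, `LEXICON`, `expand` and `convert`, the Pinyin spellings (used here instead of phonetic symbols for readability) and all values are hypothetical.

```python
from itertools import product

# Hypothetical user-defined confusing set: each syllable maps to the
# syllables it is commonly mistyped as (illustrative values only).
CONFUSING_SET = {
    "zhi": {"zi"},
    "zi": {"zhi"},
    "shi": {"si"},
    "si": {"shi"},
}

# Toy lexicon keyed by syllable tuple. The scores are made-up stand-ins
# for a corpus-derived association measure such as binding force.
LEXICON = {
    ("zi", "shi"): [("姿勢", 0.4)],
    ("zhi", "shi"): [("知識", 0.9), ("指示", 0.6)],
}


def expand(syllables):
    """Return every syllable sequence reachable through the confusing set."""
    options = [sorted({s} | CONFUSING_SET.get(s, set())) for s in syllables]
    return [tuple(seq) for seq in product(*options)]


def convert(syllables):
    """Return the best-scoring word over all confusable readings, or None."""
    candidates = [
        (score, word)
        for seq in expand(syllables)
        for word, score in LEXICON.get(seq, [])
    ]
    return max(candidates)[1] if candidates else None
```

In this sketch, `convert(("zi", "si"))` still reaches the lexicon entries for `("zhi", "shi")` even though both syllables were typed with a confusable initial. The number of expanded sequences grows multiplicatively with input length, which is presumably why a real-time system needs an indexing scheme rather than brute-force enumeration.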

Related resources

Corpus-Based Adaptation Mechanisms for Chinese Homophone Disambiguation

Based on the concepts of bidirectional conversion and automatic evaluation, we propose two user-adaptation mechanisms, character-preference learning and pseudo-word learning, for resolving Chinese homophone ambiguities in syllable-to-character conversion. The 1991 United Daily corpus of approximately 10 million Chinese characters is used for extraction of 10 reporter-specific article databases a...

Chinese Input Method Based On Reduce

In this paper we study the problem of simplifying Chinese input method and making it suitable for use with mobile devices. To see the feasibility of aggressively reducing the number of keystrokes per Chinese character, we compare three input modes: character-based, syllable-based and first-symbol-based. Specifically, we use these linguistic units as token types and compare the perplexities. Wit...

Syllable-based Machine Transliteration with Extra Phrase Features

This paper describes our syllable-based phrase transliteration system for the NEWS 2012 shared task on English-Chinese track and its back. Grapheme-based transliteration maps the character(s) on the source side to the target character(s) directly. However, character-based segmentation on the English side causes ambiguity in the alignment step. In this paper we utilize a phrase-based model to solve ma...

Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News

This paper applies Chinese subword representations, namely character and syllable n-grams, into the TextTiling-based automatic story segmentation of Chinese broadcast news. We show the robustness of Chinese subwords against speech recognition errors, out-of-vocabulary (OOV) words and versatility in word segmentation in lexical matching on errorful Chinese speech recognition transcripts. We prop...

Towards a Chinese text-to-speech system with higher naturalness

This paper presents our research efforts on Chinese text-to-speech towards higher naturalness. The main results can be summarized as follows: 1. In the proposed TTS system, syllable-sized units were cut out from real recorded speech, and the synthetic speech was generated by concatenating these units back together. 2. The integration of units synthesized by rules with natural units was tested...

Publication date: 2003